Andrea Cappozzo andrea.cappozzo@unimi.it
AndreaCappozzo
andreacappozzo.rbind.io
R packages
The objective of these lectures is to present the basic structure of an R package and a set of simple but powerful routines that you can use to build your own R packages.
I will showcase the relevant tools and, at the end, we will build an R package that can be used to decide the optimal date for a happy hour.
What is an R package?
Following the book R Packages (2e), we could say that an R package is the fundamental unit of shareable R code. A package bundles together code, data, documentation, and tests, and is easy to share with others.
An R package can be stored on CRAN (Comprehensive R Archive Network), whereas its development version can be stored on GitHub or other hosting services.
R packages are organized in a standardized format that we must follow. This makes life easier, since every package is laid out according to the same template.
What is an R package? (cont)
A bit of terminology now… A package is a directory of files that extends R, containing, at minimum, the files DESCRIPTION and NAMESPACE and an R/ directory.
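A package is not a library. A library is the directory in which installed packages are stored; library(pkg) loads a package from such a directory.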
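To make the distinction concrete, the following base R sketch lists the libraries (directories) known to the current session and counts the packages installed in each. The variable names are illustrative only.

```r
# A library is a directory on disk that holds installed packages.
lib_dirs <- .libPaths()
print(lib_dirs)

# Count the installed packages found in each library.
pkgs_per_lib <- vapply(
  lib_dirs,
  function(lib) nrow(installed.packages(lib.loc = lib)),
  integer(1)
)
print(pkgs_per_lib)
```

Each entry of lib_dirs is a library; each folder inside it is an installed package.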
Beware, maintaining and updating an R package can be an extremely time-consuming process…
Maybe I’ll become a theoretician. Nobody expects you to maintain a theorem. -Doug Bates (Matrix and RcppEigen maintainer, lme4-author)
Peek at the desired product
Now we are going to develop an R package named statsAndBooze. The objective of this package is to find the optimal date for happy hour given a set of constraints.
library(statsAndBooze)
beer_dates <- parse_dates(
  dates = list(
    andrea = c("2024-11-27", "2024-11-28", "2024-11-29", "2024-11-30", "2024-12-01"), ## available from 27/11 to 1/12
    federico = c("2024-11-29", "2024-12-02") ## available on 2 days
  )
)
decide_happy_hour(beer_dates)
#> [1] "2024-11-29"
The chosen path should point to a non-existing directory that will be created by RStudio. Do not store an R package inside another R package or a Git repo.
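The command this slide refers to is not reproduced in these notes. Judging from the log message on the following slide, it is most likely a call to usethis::create_package() (also available after loading devtools). A minimal sketch, assuming usethis is installed and using a temporary directory as a stand-in for your own path:

```r
# Stand-in location for illustration only; in practice pick your own path.
pkg_path <- file.path(tempdir(), "statsAndBooze")

# The target must not exist yet and must not sit inside another package or Git repo.
if (requireNamespace("usethis", quietly = TRUE) && !dir.exists(pkg_path)) {
  # open = FALSE keeps the current session; by default an interactive
  # session opens the newly created project instead.
  usethis::create_package(pkg_path, open = FALSE)
}
```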
Let’s start from scratch… (cont)
The previous command should open a new RStudio session that contains the skeleton of an empty R package. We will explore its content in a couple of minutes.
You should also see a log message like
✔ Creating /Users/andrea/Documents/r_packages/statsAndBooze/.
✔ Setting active project to "/Users/andrea/Documents/r_packages/statsAndBooze".
✔ Creating 'R/'
✔ Writing 'DESCRIPTION'
Package: statsAndBooze
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
    * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
Now we can analyze the log more precisely together!
Let’s start from scratch… (cont)
The following lists the content of the new directory:
.gitignore: Lists the files that Git should not track.
.Rbuildignore: Similar to .gitignore, but excludes files from the package build.
DESCRIPTION: Stores the metadata of your package (e.g., author, description, dependencies).
R/: The directory where R scripts go.
NAMESPACE: Declares the functions your package exports and the external functions it imports from other packages. DO NOT EDIT BY HAND.
If you are co-authoring the package, list all authors' names and roles in the Authors@R field of DESCRIPTION. More details here.
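As an illustration, the Authors@R field holds R code built with utils::person(). The names and e-mail address below are placeholders, not the actual authors of the package:

```r
# Value for the Authors@R field in DESCRIPTION (placeholder people).
authors <- c(
  person("First", "Last",
    email = "first.last@example.com",
    role = c("aut", "cre") # author and maintainer
  ),
  person("Second", "Author", role = "aut") # co-author
)
print(authors)
```

In DESCRIPTION, the c(...) expression is what goes after "Authors@R:".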
Package dependencies
As mentioned, we’re developing an R package to parse dates and intervals.
Working with dates can be a nightmare… So, we’ll use wrappers from the lubridate package.
When developing an R package, do not call library() inside package code: it modifies the user's search path, and it does not declare the dependency in DESCRIPTION. See details here and here.
Instead, use use_package("pkg") to declare dependencies.
Package dependencies (cont)
You should see the following output:
use_package("lubridate")
✔ Adding 'lubridate' to Imports field in DESCRIPTION
* Refer to functions with `lubridate::fun()`
This message indicates that when using lubridate functions, you should prefix them with lubridate::.
Check the DESCRIPTION file to see the change.
This applies to any function that is not part of the base package, including functions from packages that ship with R, such as stats or utils.
Question: How can you determine which package defines a function?
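One possible answer, sketched in base R with functions that ship with every R installation (sd(), median() and head() are chosen only as examples):

```r
# A function defined in a package has that package's namespace as its environment.
pkg_of_sd <- environmentName(environment(sd))
print(pkg_of_sd)

# find() reports where a name is found on the search path.
where_median <- find("median")
print(where_median)

# The same lookup works for a function referenced with `::`.
pkg_of_head <- environmentName(environment(utils::head))
print(pkg_of_head)
```

The help page of a function (?sd) also shows the package name in its top-left corner.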
Interactive development
It’s easier to run initial tests in an interactive session before adding them to the package.
Our first goal is to define a function that takes a list of date strings and returns the parsed dates:
parse_dates(
  dates = list(
    andrea = "2024-11-29", ## exactly in this format
    federico = "2024-11-30" ## exactly in this format
  )
)
$andrea
[1] "2024-11-29" ## should have class = "Date"

$federico
[1] "2024-11-30"
Now it’s your turn! Try coding this function.
The first function
Now that we have sketched the skeleton of the function, we can add it to our package. First, create an R script in the R/ folder by running use_r("path.R").
For example, run:
use_r("parse.R")
✔ Setting active project to '/Users/andrea/Documents/r_packages/statsAndBooze'
* Modify 'R/parse.R'
* Call `use_test()` to create a matching test file
Copy the function definition (and only the function definition) into the new script. When referring to a lubridate function, prefix it with lubridate::.
#' Parse a list of strings into dates
#'
#' @details Please note that each date must be specified in the YYYY-MM-DD format.
#' @param dates A list of strings specifying dates.
#' @return A list of dates. Each string is converted to an object of class Date.
#' @export
#' @examples
#' parse_dates(list("2024-11-29", "2024-11-30"))
parse_dates <- function(dates) {
  lapply(dates, lubridate::as_date)
}
Documentation (cont)
Run document() to let roxygen2 generate the documentation.
Check the NAMESPACE file; you should see:
## Generated by roxygen2: do not edit by hand

export(parse_dates)
Let’s run R CMD check again, for example with check().
If everything works as expected, this is a good time for another commit!
A minimal R package
Now we have a minimal working package! We can install it by running devtools::install() or by using the Build panel.
After installing the package, try this in a fresh R session:
library(statsAndBooze)
beer_dates <- list(
  ## We can see that our function works with >= 2 people and >= 2 dates
  andrea = c("2024-11-29", "2024-11-30"),
  federico = "2024-11-30",
  chiara = "2024-11-30"
)
parse_dates(beer_dates)
If there are no errors, our package works 🎉!
To infinity and beyond 🚀
Now it’s time to expand our package! Our objective is to decide a common day for a happy hour, and we’re not doing that yet.
Currently, we’re only parsing input constraints into a list of Date objects. The key step—the organization of the happy hour—is still missing!
As before, it’s convenient to start testing in an interactive session.
decide_happy_hour() function
How would you programmatically determine the common day in the following list?
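One possible approach, sketched here in base R: intersect the availability of all participants and, if several days remain, pick the earliest one. The name decide_happy_hour_sketch(), the tie-breaking rule, and the example list are assumptions made for illustration, not the package's actual implementation.

```r
# Hypothetical helper: returns the earliest date on which everyone is available.
decide_happy_hour_sketch <- function(dates) {
  # Compare dates as ISO strings so that set operations keep a predictable type.
  as_text <- lapply(dates, function(d) format(as.Date(d)))
  # Keep only the days that appear in every participant's availability.
  common <- Reduce(intersect, as_text)
  if (length(common) == 0) {
    stop("No common date found.", call. = FALSE)
  }
  # Assumed tie-breaking rule: the earliest common day wins.
  min(as.Date(common))
}

# Illustrative input (already parsed or plain strings both work here).
beer_dates <- list(
  andrea = c("2024-11-29", "2024-11-30"),
  federico = "2024-11-30",
  chiara = "2024-11-30"
)
happy_hour <- decide_happy_hour_sketch(beer_dates)
print(happy_hour)
```

Raising an error when no common day exists is one design choice among several; returning an empty Date vector would be an alternative.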
Finish the documentation by adding examples and completing the DESCRIPTION file. Then, CHECK and commit!
Unit testing
The previous example informally shows that our R package works for a particular case.
Now we want to formalize our expectations into unit tests!
Why do we need unit testing? Two main reasons:
We want to check for wrong inputs or edge cases that may not be obvious to users.
We want to ensure that all functionalities work as expected, even after refactoring.
Unit testing (cont)
After loading devtools, run use_testthat() to set up the unit testing environment:
use_testthat()
✔ Setting active project to '/Users/andrea/Documents/r_packages/statsAndBooze'
✔ Adding 'testthat' to Suggests field in DESCRIPTION
✔ Setting Config/testthat/edition field in DESCRIPTION to '3'
✔ Creating 'tests/testthat/'
✔ Writing 'tests/testthat.R'
* Call `use_test()` to initialize a basic test file and open it for editing.
Next, use use_test(<file>) to create a new test file. For example, use_test("parse"):
use_test("parse")
Unit testing (cont)
Now edit the new file and write your unit test(s). Start by summarizing the objective of the test:
testthat provides several expectation functions (expect_length(), expect_message(), expect_error(), etc.) to test different aspects of your package, such as equality, messages, and errors.
Then, write the main part of the test, comparing observed output to our expected result.
After loading the package (load_all()), you can run the new test interactively like any other R function.
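As a sketch of what such a test file might contain, assuming testthat is installed: the stand-in definition of parse_dates() based on as.Date() is an assumption that keeps the snippet self-contained; inside the package, the real function is made available by load_all() and the stand-in should be dropped.

```r
library(testthat)

# Stand-in for the package function so that the snippet runs on its own.
parse_dates <- function(dates) lapply(dates, as.Date)

# The description summarizes the objective of the test.
test_that("parse_dates() returns one Date object per person", {
  parsed <- parse_dates(list(andrea = "2024-11-29", federico = "2024-11-30"))

  # Compare the observed output with the expected result.
  expect_length(parsed, 2)
  expect_s3_class(parsed$andrea, "Date")
  expect_equal(parsed$federico, as.Date("2024-11-30"))
})
```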
Unit testing (cont)
Repeat the same procedure for other functions:
use_test("decide")
✔ Setting active project to '/Users/andrea/Documents/r_packages/statsAndBooze'
✔ Writing 'tests/testthat/test-decide.R'
* Modify 'tests/testthat/test-decide.R'